Add Comprehensive Circuit Breaker User Guide for KMesh Kernel-Native Implementation #110

Open: wants to merge 3 commits into base: main

DeshDeepakKant
Contributor

Description

During the Open Source Promotion Plan (OSPP), KMesh has successfully implemented a circuit breaker mechanism in Kernel-Native mode. However, the current documentation lacks a comprehensive user guide to help developers understand and utilize this feature effectively.

Objectives

  • Create comprehensive documentation
  • Explain circuit breaker configuration
  • Provide implementation details
  • Include usage examples and best practices

Key Components

  • Technical overview
  • Configuration guide
  • Code snippets
  • Troubleshooting tips

@kmesh-bot
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign lizhencheng9527 for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

netlify bot commented Jan 21, 2025

Deploy Preview for kmesh-net ready!

Name Link
🔨 Latest commit 4bc5264
🔍 Latest deploy log https://app.netlify.com/sites/kmesh-net/deploys/6792c183236cc2000812b9c5
😎 Deploy Preview https://deploy-preview-110--kmesh-net.netlify.app

@@ -0,0 +1,253 @@
---
draft: false
linktitle: Circuit Breaker

Contributor

Please add a Chinese version of the guide too.

Member

@DeshDeepakKant is not from China; he does not speak Chinese.

spec:
  containers:
  - name: service
    image: your-service-image

Member

We should provide a runnable application; please refer to the other guides.


```bash
# Install hey load testing tool
go install github.com/rakyll/hey@latest

Member

go install github.com/rakyll/hey@latest

# Generate load
hey -n 1000 -c 50 http://sample-service/endpoint

Member

I am not sure how circuit breaking works here, since this is run from a binary. How is the client managed by kmesh?


```bash
# View KMesh circuit breaker logs
kubectl logs -n kmesh -l app=kmesh circuit-breaker

Member

?

-n kmesh-system

Contributor Author

@hzxuzhonghu Thank you for the review. I have made the changes and pushed them. The updates are running smoothly and have been verified on my end. If any further adjustments are needed, please let me know.

Additionally, I've attached some terminal logs for your reference. Could you please confirm if everything looks correct?

Contributor Author

Screenshot from 2025-01-23 19-37-36
Screenshot from 2025-01-23 19-37-18

@LiZhenCheng9527
Contributor

Can you confirm that you have verified what you are submitting yourself?
This issue makes us consider whether to assign the LFX project to you.

@DeshDeepakKant force-pushed the docs/circuit-breaker branch 2 times, most recently from 5295889 to 2545431 on January 23, 2025 22:23
…uide

- Implement detailed documentation for Circuit Breaker feature
- Include technical implementation details
- Provide configuration examples and best practices
- Cover troubleshooting and monitoring aspects

Signed-off-by: Desh Deepak Kant <deshdeepakkant@gmail.com>
Closes kmesh-net#103
Signed-off-by: DeshDeepakKant <deshdeepakkant@gmail.com>
Signed-off-by: DeshDeepakKant <deshdeepakkant@gmail.com>
Signed-off-by: DeshDeepakKant <deshdeepakkant@gmail.com>
@kmesh-bot
Collaborator

Keywords which can automatically close issues and at(@) or hashtag(#) mentions are not allowed in commit messages.

The list of commits with invalid commit messages:

  • 949453f docs(circuit-breaker): Add comprehensive KMesh Circuit Breaker user guide

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

fortio load -c 2 -qps 20 -t 30s http://test-service
```

### Analyzing Results

Contributor

Can you show the results as well?
Logs or metrics are fine.

Contributor Author

@LiZhenCheng9527 Thank you for your response. Could you kindly clarify what you mean by "results"? Are you referring to the terminal result logs to include in the user guide?

For your convenience, I have attached the terminal log below for reference. Please let me know if this aligns with what you were looking for or if there are any additional details or metrics you'd like me to provide.

desh@pop-os:~/kt$ # Heavy load to trigger circuit breaker
kubectl exec -it deploy/fortio -- \
fortio load -c 5 -qps 100 -t 30s http://test-service

# Verify circuit breaker status
kubectl get destinationrule test-circuit-breaker -o yaml

# Simulate service failure
kubectl scale deployment test-service --replicas=0
15:15:39.852 r1 [INF] scli.go:122> Starting, command="Φορτίο", version="1.66.5 h1:WTJzTGOA12YWZSM5g43602lH+GOsmP3eKHXLnuRW4vs= go1.22.7 amd64 linux", go-max-procs=12
Fortio 1.66.5 running at 100 queries per second, 12->12 procs, for 30s: http://test-service
15:15:39.853 r1 [INF] httprunner.go:121> Starting http test, run=0, url="http://test-service", threads=5, qps="100.0", warmup="parallel", conn-reuse=""
Starting at 100 qps with 5 thread(s) [gomax 12] for 30s : 600 calls each (total 3000)
15:16:09.858 r84 [INF] periodic.go:851> T003 ended after 30.000992951s : 600 calls. qps=19.99933805457598
15:16:09.858 r83 [INF] periodic.go:851> T002 ended after 30.000983273s : 600 calls. qps=19.99934450615098
15:16:09.858 r82 [INF] periodic.go:851> T001 ended after 30.001020274s : 600 calls. qps=19.99931984046497
15:16:09.858 r85 [INF] periodic.go:851> T004 ended after 30.001037727s : 600 calls. qps=19.99930820592978
15:16:09.858 r81 [INF] periodic.go:851> T000 ended after 30.001090238s : 600 calls. qps=19.99927320107946
Ended after 30.001156064s : 3000 calls. qps=99.996
15:16:09.858 r1 [INF] periodic.go:581> Run ended, run=0, elapsed=30001156064, calls=3000, qps=99.99614660182583
Sleep times : count 2995 avg 0.049068316 +/- 0.0003238 min 0.048044947 max 0.049896243 sum 146.959607
Aggregated Function Time : count 3000 avg 0.00034710756 +/- 9.911e-05 min 0.000126072 max 0.000813633 sum 1.04132267
# range, mid point, percentile, count
>= 0.000126072 <= 0.000813633 , 0.000469852 , 100.00, 3000
# target 50% 0.000469738
# target 75% 0.000641685
# target 90% 0.000744854
# target 99% 0.000806755
# target 99.9% 0.000812945
Error cases : no data
# Socket and IP used for each connection:
[0]   1 socket used, resolved to 10.96.230.153:80, connection timing : count 1 avg 0.000159577 +/- 0 min 0.000159577 max 0.000159577 sum 0.000159577
[1]   1 socket used, resolved to 10.96.230.153:80, connection timing : count 1 avg 0.000138426 +/- 0 min 0.000138426 max 0.000138426 sum 0.000138426
[2]   1 socket used, resolved to 10.96.230.153:80, connection timing : count 1 avg 0.000130691 +/- 0 min 0.000130691 max 0.000130691 sum 0.000130691
[3]   1 socket used, resolved to 10.96.230.153:80, connection timing : count 1 avg 0.000233499 +/- 0 min 0.000233499 max 0.000233499 sum 0.000233499
[4]   1 socket used, resolved to 10.96.230.153:80, connection timing : count 1 avg 0.000172922 +/- 0 min 0.000172922 max 0.000172922 sum 0.000172922
Connection time histogram (s) : count 5 avg 0.000167023 +/- 3.646e-05 min 0.000130691 max 0.000233499 sum 0.000835115
# range, mid point, percentile, count
>= 0.000130691 <= 0.000233499 , 0.000182095 , 100.00, 5
# target 50% 0.000169244
# target 75% 0.000201372
# target 90% 0.000220648
# target 99% 0.000232214
# target 99.9% 0.00023337
Sockets used: 5 (for perfect keepalive, would be 5)
Uniform: false, Jitter: false, Catchup allowed: true
IP addresses distribution:
10.96.230.153:80: 5
Code 200 : 3000 (100.0 %)
Response Header Sizes : count 3000 avg 238 +/- 0 min 238 max 238 sum 714000
Response Body/Total Sizes : count 3000 avg 853 +/- 0 min 853 max 853 sum 2559000
All done 3000 calls (plus 5 warmup) 0.347 ms avg, 100.0 qps
apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"networking.istio.io/v1alpha3","kind":"DestinationRule","metadata":{"annotations":{},"name":"test-circuit-breaker","namespace":"default"},"spec":{"host":"test-service","trafficPolicy":{"connectionPool":{"http":{"http1MaxPendingRequests":1,"maxRequestsPerConnection":1}},"outlierDetection":{"baseEjectionTime":"30s","consecutive5xxErrors":3,"interval":"5s"}}}}
  creationTimestamp: "2025-01-23T13:22:14Z"
  generation: 1
  name: test-circuit-breaker
  namespace: default
  resourceVersion: "64439"
  uid: 07ed1da0-79c7-45a8-81b8-a7912e6d1568
spec:
  host: test-service
  trafficPolicy:
    connectionPool:
      http:
        http1MaxPendingRequests: 1
        maxRequestsPerConnection: 1
    outlierDetection:
      baseEjectionTime: 30s
      consecutive5xxErrors: 3
      interval: 5s
deployment.apps/test-service scaled
desh@pop-os:~/kt$ # Restore service
kubectl scale deployment test-service --replicas=1

# Test recovery
kubectl exec -it deploy/fortio -- \
fortio load -c 2 -qps 20 -t 30s http://test-service
deployment.apps/test-service scaled
15:16:13.095 r1 [INF] scli.go:122> Starting, command="Φορτίο", version="1.66.5 h1:WTJzTGOA12YWZSM5g43602lH+GOsmP3eKHXLnuRW4vs= go1.22.7 amd64 linux", go-max-procs=12
Fortio 1.66.5 running at 20 queries per second, 12->12 procs, for 30s: http://test-service
15:16:13.095 r1 [INF] httprunner.go:121> Starting http test, run=0, url="http://test-service", threads=2, qps="20.0", warmup="parallel", conn-reuse=""
15:16:13.097 r52 [ERR] http_client.go:954> Unable to connect, dest={"IP":"10.96.230.153","Port":80,"Zone":""}, err="dial tcp 10.96.230.153:80: connect: connection refused", numfd=7, thread=1, run=0
15:16:13.097 r51 [ERR] http_client.go:954> Unable to connect, dest={"IP":"10.96.230.153","Port":80,"Zone":""}, err="dial tcp 10.96.230.153:80: connect: connection refused", numfd=6, thread=0, run=0
Aborting because of error -1 for http://test-service (0 bytes)
command terminated with exit code 1
desh@pop-os:~/kt$ # View test results
kubectl logs deploy/fortio
Found 2 pods, using pod/fortio-deploy-5669d4866b-rwlzj
{"ts":1737903211.076985,"level":"info","r":1,"file":"updater.go","line":50,"msg":"Configmap flag value watching on /etc/fortio"}
{"ts":1737903211.077518,"level":"crit","r":1,"file":"scli.go","line":83,"msg":"Unable to watch config/flag changes in /etc/fortio: dflag: error initializing fsnotify watcher"}
{"ts":1737903211.077641,"level":"info","r":1,"file":"scli.go","line":122,"msg":"Starting","command":"Φορτίο","version":"1.66.5 h1:WTJzTGOA12YWZSM5g43602lH+GOsmP3eKHXLnuRW4vs= go1.22.7 amd64 linux","go-max-procs":12}
{"ts":1737903211.079770,"level":"info","r":1,"msg":"Fortio 1.66.5 tcp-echo server listening on tcp [::]:8078"}
{"ts":1737903211.079867,"level":"info","r":1,"msg":"Fortio 1.66.5 udp-echo server listening on udp [::]:8078"}
{"ts":1737903211.079908,"level":"info","r":1,"msg":"Fortio 1.66.5 grpc 'ping' server listening on tcp [::]:8079"}
{"ts":1737903211.080657,"level":"info","r":1,"msg":"Fortio 1.66.5 https redirector server listening on tcp [::]:8081"}
{"ts":1737903211.082276,"level":"info","r":1,"msg":"Fortio 1.66.5 http-echo server listening on tcp [::]:8080"}
{"ts":1737903211.082363,"level":"info","r":1,"msg":"Data directory is /var/lib/fortio"}
{"ts":1737903211.082392,"level":"info","r":1,"msg":"REST API on /fortio/rest/run, /fortio/rest/status, /fortio/rest/stop, /fortio/rest/dns"}
	 UI started - visit:
		http://localhost:8080/fortio/
	 (or any host/ip reachable on this server)
{"ts":1737903211.083216,"level":"info","r":1,"msg":"Debug endpoint on /debug, Additional Echo on /debug/echo/, Flags on /fortio/flags, and Metrics on /debug/metrics"}
{"ts":1737903211.083302,"level":"info","r":1,"file":"fortio_main.go","line":307,"msg":"All fortio 1.66.5 h1:WTJzTGOA12YWZSM5g43602lH+GOsmP3eKHXLnuRW4vs= go1.22.7 amd64 linux servers started!"}

Contributor

If a circuit breaker is configured, some requests will fail when multiple connections access the service at the same time.
You can start by providing the fortio results without a circuit breaker configured.

IP addresses distribution: 10.96.230.153:80: 5 
Code 200 : 3000 (100.0 %)

Then provide the fortio results with a circuit breaker configured.

IP addresses distribution: 10.96.230.153:80: 5 
Code 200 : 1914 (63.8%) 
Code 503 : 1086 (36.2%)
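
One way to make this comparison easier to read in the guide is to reduce each fortio run to its status-code lines. The following is only a sketch: `fortio_code_summary` is a hypothetical helper name, not part of fortio or Kmesh, and it assumes the output contains lines shaped like `Code 200 : 3000 (100.0 %)` as in the logs above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of fortio or Kmesh): summarize the
# "Code NNN : COUNT (...)" lines of a fortio run read from stdin.
fortio_code_summary() {
  awk '
    # Match lines such as: Code 503 : 1086 (36.2%)
    $1 == "Code" && $3 == ":" {
      count[$2] = $4
      total += $4
    }
    END {
      if (total == 0) {
        print "no status-code lines found"
        exit 1
      }
      # Iteration order over awk arrays is unspecified.
      for (code in count)
        printf "code=%s count=%d share=%.1f%%\n", code, count[code], 100 * count[code] / total
      # Share of requests answered with 503; 0.0% when no 503 line exists.
      printf "rejected_503_share=%.1f%%\n", 100 * count["503"] / total
    }
  '
}

# Example usage (assumes the fortio deployment and test-service from the log above):
#   kubectl exec deploy/fortio -- \
#     fortio load -c 5 -qps 100 -t 30s http://test-service 2>&1 | fortio_code_summary
```

Running it once without the DestinationRule and once with it applied gives two short summaries that can be placed side by side in the "Analyzing Results" section.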

5 participants